System and method for capture of change data from distributed data sources, for use with heterogeneous targets

ABSTRACT

In accordance with an embodiment, described herein is a system and method for capture of change data from a distributed data source system, for example a distributed database or a distributed data stream, and preparation of a canonical format output, for use with one or more heterogeneous targets, for example a database or message queue. The change data capture system can include support for features such as distributed source topology-awareness, initial load, deduplication, and recovery. A technical purpose of the systems and methods described herein includes determination and communication of changes performed to data at a distributed data source that includes a large amount of data across a plurality of nodes, to one or more target computer systems.

CLAIM OF PRIORITY AND CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application titled “SYSTEM AND METHOD FOR CAPTURE OF CHANGE DATA FROM DISTRIBUTED DATA SOURCES, FOR USE WITH HETEROGENEOUS TARGETS”, application Ser. No. 16/145,707, filed Sep. 28, 2018; which claims the benefit of priority to U.S. Provisional Patent Application titled “SYSTEM AND METHOD FOR CAPTURE OF CHANGE DATA FROM NOSQL DATABASES OR DISTRIBUTED DATA STREAMS, FOR USE WITH HETEROGENEOUS TARGETS”, Application No. 62/566,113, filed Sep. 29, 2017; and is related to U.S. patent application titled “MYSQL DATABASE-HETEROGENEOUS LOG BASED REPLICATION”, application Ser. No. 13/077,760, filed Mar. 31, 2011, subsequently issued as U.S. Pat. No. 8,510,270, which claims the benefit of priority to U.S. Provisional Patent Application titled “HETEROGENEOUS LOG BASED REPLICATION FROM DATABASES SUCH AS MYSQL DATABASES”, Application No. 61/368,141, filed Jul. 27, 2010; each of which above applications are herein incorporated by reference.

COPYRIGHT NOTICE

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

FIELD OF TECHNOLOGY

Embodiments described by the present application relate to capture of change data from a distributed data source system, for use with one or more heterogeneous targets, including support for features such as distributed source topology-awareness, initial load, deduplication, and recovery.

BACKGROUND

Organizations may at times need to move data between different database environments, for example to create a backup of the data, or to enable sharing of the data between different database applications. Data replication systems help address this need, for example by detecting and replicating changes to the data in a database table, as a result of row operations, rather than copying the entire table and the data therein. Such an approach can be used to synchronize the data in a target database with the data in a source database.

However, environments that support very large data sets, for example big data environments, present challenges related to availability, scalability, and fault-tolerance; and traditional databases or data replication systems may not scale sufficiently to handle such larger amounts of data. Organizations are increasingly turning to systems that provide a distributed data source, for example databases such as Apache Cassandra, Kafka, MongoDB, Oracle NoSQL, or Google Bigtable, to address these considerations. These are some examples of the types of environments in which embodiments of the present teachings can be used.

SUMMARY

In accordance with an embodiment, described herein is a system and method for capture of change data from a distributed data source system, for example a distributed database or a distributed data stream, and preparation of a canonical format output, for use with one or more heterogeneous targets, for example a database or message queue. The change data capture system can include support for features such as distributed source topology-awareness, initial load, deduplication, and recovery. A technical purpose of the systems and methods described herein includes determination and communication of changes performed to data at a distributed data source that includes a large amount of data across a plurality of nodes, to one or more target computer systems.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a system for capture of change data from a distributed data source, for use with heterogeneous targets, in accordance with an embodiment.

FIG. 2 further illustrates a system for capture of change data from a distributed data source, in accordance with an embodiment.

FIG. 3 further illustrates a system for capture of change data from a distributed data source, in accordance with an embodiment.

FIG. 4 further illustrates a system for capture of change data from a distributed data source, in accordance with an embodiment.

FIG. 5 further illustrates a system for capture of change data from a distributed data source, in accordance with an embodiment.

FIG. 6 further illustrates a system for capture of change data from a distributed data source, in accordance with an embodiment.

FIG. 7 illustrates a flowchart of a method for capture of change data from a distributed data source, in accordance with an embodiment.

FIG. 8 illustrates a flowchart of a deduplication process, for use with capture of change data from a distributed data source, in accordance with an embodiment.

FIG. 9 illustrates a flow diagram of a system for capture of change data from a distributed data source, including a deduplication process, in accordance with an embodiment.

FIG. 10 further illustrates a flow diagram of a system for capture of change data from a distributed data source, in accordance with an embodiment.

FIG. 11 illustrates a flowchart of a recovery process, for use with capture of change data from a distributed data source, in accordance with an embodiment.

FIG. 12 illustrates a flow diagram of a system for capture of change data from a distributed data source, including a recovery process, in accordance with an embodiment.

FIG. 13 illustrates a flow diagram of a system for capture of change data from a distributed data source, in accordance with another embodiment.

FIG. 14 illustrates a flow diagram of a system for capture of change data from a distributed data source, in accordance with yet another embodiment.

FIG. 15 illustrates an example of a recovery scenario for use with a system for capture of change data from a distributed data source, in accordance with an embodiment.

FIG. 16 illustrates another example of a recovery scenario for use with a system for capture of change data from a distributed data source, in accordance with an embodiment.

FIG. 17 illustrates another example of a recovery scenario for use with a system for capture of change data from a distributed data source, in accordance with an embodiment.

FIG. 18 illustrates another example of a recovery scenario for use with a system for capture of change data from a distributed data source, in accordance with an embodiment.

DETAILED DESCRIPTION

As described above, organizations may at times need to move data between different database environments, for example to create a backup of the data, or to enable sharing of the data between different database applications. Data replication systems help address this need, for example by detecting and replicating changes to the data in a database table, as a result of row operations, rather than copying the entire table and the data therein. Such an approach can be used to synchronize the data in a target database with the data in a source database. However, environments that support very large data sets, for example big data environments, present challenges related to availability, scalability, and fault-tolerance.

For example, data replication systems which operate to read commit logs or transaction logs, to extract changes in the data source, face the following challenges: source systems may have more than one copy of data on various nodes; source systems are partition tolerant; some nodes may go down and not have network connectivity; and a source system may have one or more new nodes added, leading to new data from a different location.

An organization may attempt to address this problem using jobs which connect to the source system, and run queries to pull out entire data sets or table data across different source and target systems; however, when the data volumes are high, there is significant latency in this data movement.

Data Capture from Distributed Data Sources

In accordance with an embodiment, described herein is a system and method for capture of change data from a distributed data source system, for example a distributed database or a distributed data stream, and preparation of a canonical format output, for use with one or more heterogeneous targets, for example a database or message queue. The change data capture (CDC) system can include support for features such as distributed source topology-awareness, initial load, deduplication, and recovery. A technical purpose of the systems and methods described herein includes determination and communication of changes performed to data at a distributed data source that includes a large amount of data across a plurality of nodes, to one or more target computer systems.

In accordance with an embodiment, the change data capture system includes support for features such as distributed source topology-awareness, initial load, deduplication, and recovery, which enables, for example:

Capture of incremental changes from a distributed data source, for use with heterogeneous targets, for example, databases or message queues.

Automatic deduplication of the data provided by the distributed data source.

Automatic discovery of the distributed source topology, with configurable access to source change trace entity(s); and distributed source topology-awareness that supports dynamic changes to the distributed data source, such as nodes being added or removed.

Support for recovery, so that when a node in the distributed data source system which had been actively providing records becomes unavailable, for example due to failure, a replica node can be selected and a lookup made for the last record processed by the failed node. The capture process itself is tolerant to crashes, and can recover and automatically reposition itself, without introducing any data loss or duplicate records.

In accordance with an embodiment, generally described, the change data capture system operates to:

Discover the distributed source topology of the cluster. The information about the location of the distributed source change trace entity(s) (e.g., commit logs, database tables, or message queues), or other log information or files on different nodes is retrieved. If the source change trace entity(s) reside on different physical machines, then the process can pull the change trace entity(s) over a network connection to individual nodes. The capture process keeps track of the origination node of the change trace entity(s); and

Process source change trace entity(s) from every node and enrich a deduplication cache for every record available in the source change trace entity.

Whenever a new record is processed, the deduplication cache can decide to pass through the record or filter out duplicates. This is advantageous since generally multi-node systems push multiple copies of data from various points or nodes in the cluster.

The deduplication cache is topology-aware, and has the intelligence to accept data from a node which is possibly alive, and tries to ignore data from nodes which are likely down due to network failure or machine shutdown.

The capture process can auto-replay missing data records from a replica, including, for example, selecting a replica with the most history, when an active node used for replication goes down. The capture process can also order by timestamp the records read from various source nodes in different time zones.
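
For purposes of illustration, the following sketch (in Java) outlines the general capture loop described above; every type and method name shown (Topology, SourceNode, ChangeRecord, and so on) is a hypothetical placeholder, not the API of any particular product:

    // Illustrative sketch only; every type and method name here is a hypothetical placeholder.
    import java.util.List;

    interface SourceNode {
        String address();
        List<ChangeRecord> readNewRecords();       // read from the node's change trace entity
    }

    interface ChangeRecord {
        String position();                         // node-local position of this record
        String toCanonical(String originNode);     // canonical format output, with the origin node embedded
    }

    interface Topology        { List<SourceNode> discoverLiveNodes(); }
    interface DedupCache      { boolean accept(ChangeRecord record, SourceNode origin); }
    interface CanonicalOutput { void write(String canonicalRecord); }
    interface PositionStore   { void update(String node, String position); }

    public class CaptureLoopSketch {
        public void run(Topology topology, DedupCache dedup,
                        CanonicalOutput out, PositionStore positions) {
            while (true) {
                for (SourceNode node : topology.discoverLiveNodes()) {     // topology-aware
                    for (ChangeRecord record : node.readNewRecords()) {
                        if (dedup.accept(record, node)) {                  // filter replica duplicates
                            out.write(record.toCanonical(node.address()));
                        }
                        positions.update(node.address(), record.position());
                    }
                }
            }
        }
    }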

FIG. 1 illustrates a system for capture of change data from a distributed data source, for use with heterogeneous targets, in accordance with an embodiment.

As illustrated in FIG. 1, in accordance with an embodiment, a change data capture system 100, for example an Oracle GoldenGate environment, which can be provided at a computer or computing environment that includes one or more computer resources (e.g., CPU, memory) 102, can be configured to capture change data from a distributed data source 110, for use with one or more targets, for example a database or message queue.

In accordance with an embodiment, the change data capture system can include an extract component 104, for example an Oracle GoldenGate extract component, which can include an extract processor 111; an access module 112, for example an Oracle GoldenGate vendor access module (VAM); an access API 114 that enables communication with the distributed data source; and a change data capture process manager (CDC process manager) 116.

In accordance with an embodiment, the access module can include an access thread 120, and a reader thread 122, for use in accessing records at the distributed data source.

In accordance with an embodiment, the extract component, including its CDC process manager, can perform one or more of a capture process 117, deduplication process 118, and recovery process 119, as described in further detail below.

In accordance with an embodiment, the distributed data source system, for example a distributed database, distributed data stream, or other distributed data source, can include a plurality of nodes, for example node 150. At the distributed data source, the data write path 160 can include that data is written from memory 162 to disk 172, in the form of a source change trace entity 174 (e.g., in a Cassandra environment, a commit log), and to a memory representation 164 (e.g., in a Cassandra environment, a memory table) that is subsequently flushed 180 to disk, as a stored representation 182.

For example, in a Cassandra environment, data writes are first written to the node's commit log, and then to a memory table. When the memory table is full, it is written to disk as a stored string table. Writes are batched in the memory table until it is full, whereupon it is flushed. This allows writes to be performed using a commit log append. Once flushed, the stored string table files are immutable.

In accordance with an embodiment, the change data capture system performs automatic discovery of a distributed source topology associated with the distributed data source system, and provides access 202 to the source change trace entity(s), e.g., commit logs, at nodes of the distributed data source system.

In accordance with an embodiment, the capture process converts the change data that is read from the distributed data source, into a canonical format output 205, for consumption by the one or more targets.

In accordance with an embodiment, the change data 206 provided as an output can optionally be provided as, or include, for example, Oracle GoldenGate trail information.

In accordance with an embodiment, the one or more targets can be heterogeneous targets 210, here indicated as target system A 212 and target system B 214, examples of which can include one or more of a database, message queue, or other target.

In accordance with an embodiment, the deduplication process provides automatic deduplication of the data provided by the distributed data source. Additionally, the change data capture system can perform automatic discovery of a distributed source topology associated with the distributed data source system, and in the event a node becomes unavailable, perform the recovery process that selects a replica node at which to obtain records.

In accordance with an embodiment, the access module, for example an Oracle GoldenGate vendor access module (VAM), can include a plurality of event classes, which can be used to process events associated with a source change trace entity, and reader or processor classes, which can be used to read and process records associated with a source change trace entity.

In accordance with an embodiment, the CDC process manager can include, or interact with, a position storage 142, deduplication cache 144, and history queue 146.

In accordance with an embodiment, the position storage enables saving of checkpoint information, upon receiving a checkpoint complete event, which can include global recovery positioning information for one or more nodes, and, in environments in which a sequence identifier (ID) is used, a last-used sequence ID.

In accordance with an embodiment, the deduplication cache is built anytime a unique token (e.g., in a Cassandra environment, a partition token) is detected from an extract record from the source system. This ties the token to the source node address. If a source node is shut down or crashes, the deduplication cache is updated to remove all the tokens associated with the respective source node. This allows the process to accept records with the same token from another source replica node which is alive. On processing such a record, the deduplication cache is enriched again.

In accordance with an embodiment, if a new source node is added to the distributed data source, there is a possibility of redistribution of data records to evenly spread the data. This may lead to certain partitions of data being moved across various nodes. The deduplication cache needs to be refreshed, and any tokens which are not part of an active node will be purged. This allows the process to accept records for the purged tokens from a new source replica node. On processing such a record, the deduplication cache is enriched again.

In accordance with an embodiment, every time the deduplication cache is modified, it is also committed to a persistent storage. The capture process will always have access to the persisted deduplication cache, even if the capture process restarts after a crash or shutdown.

In accordance with an embodiment, the history queue can include a set of last records read from one or more source nodes.
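
For purposes of illustration, the following Java sketch shows one possible shape for these three structures (position storage, deduplication cache, and history queue); the class name, field choices, and history depth are hypothetical rather than those of the actual product:

    // Illustrative sketch of the three bookkeeping structures described above; names are hypothetical.
    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class CdcBookkeepingSketch {
        // Deduplication cache: partition token -> address of the node currently feeding that token.
        private final Map<Long, String> dedupCache = new ConcurrentHashMap<>();

        // Position storage: node address -> last checkpointed node-local position,
        // plus (where a sequence ID is used) the last-used sequence ID.
        private final Map<String, String> checkpointPositions = new ConcurrentHashMap<>();
        private volatile long lastSequenceId = -1;

        // History queue: the last records read per node, kept for replay when a node fails over.
        private static final int HISTORY_DEPTH = 1000;   // arbitrary illustrative depth
        private final Map<String, Deque<String>> historyQueue = new ConcurrentHashMap<>();

        void rememberRecord(String node, String recordId) {
            Deque<String> queue = historyQueue.computeIfAbsent(node, k -> new ArrayDeque<>());
            if (queue.size() >= HISTORY_DEPTH) {
                queue.removeFirst();                     // drop the oldest entry
            }
            queue.addLast(recordId);
        }

        void checkpoint(String node, String position, long sequenceId) {
            checkpointPositions.put(node, position);
            lastSequenceId = sequenceId;
        }
    }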

FIG. 2 further illustrates a system for capture of change data from a distributed data source, in accordance with an embodiment.

As illustrated in FIG. 2, in accordance with an embodiment, the distributed data source system, for example a distributed database, distributed data stream, or other distributed data source system, can include a plurality of nodes arranged within a distributed source topology 216, including in this example node A 218 and node B 220, each of which can be associated with their own source change trace entity 219, 221 respectively. The change data capture system can perform automatic discovery of the distributed source topology associated with the distributed data source system, and provides access to the source change trace entity(s) at the nodes of the distributed data source system.

FIG. 3 further illustrates a system for capture of change data from a distributed data source, in accordance with an embodiment.

As illustrated in FIG. 3, in accordance with an embodiment, upon detection of duplicate records 222 in the change data, the deduplication process can provide 224 automatic deduplication of the data provided by the distributed data source.

FIG. 4 further illustrates a system for capture of change data from a distributed data source, in accordance with an embodiment.

As illustrated in FIG. 4, in accordance with an embodiment, the change data capture system can perform automatic discovery 226 of the distributed source topology associated with the distributed data source system, including, for example, determining the presence of new nodes, in this example node N 228, together with its source change trace entity 229, within the distributed source topology.

FIG. 5 further illustrates a system for capture of change data from a distributed data source, in accordance with an embodiment.

As illustrated in FIG. 5, in accordance with an embodiment, the change data capture system can similarly determine if a node becomes unavailable 230, for example node B in this illustration; and respond to this unavailability using global recovery positioning information 232.

FIG. 6 further illustrates a system for capture of change data from a distributed data source, in accordance with an embodiment.

As illustrated in FIG. 6, in accordance with an embodiment, in the event a node becomes unavailable, the change data capture system can perform a recovery process that uses the recovery position information to reposition or otherwise select 234 a replica node, in this example node N, at which to obtain records.

FIG. 7 illustrates a flowchart of a method for capture of change data from a distributed data source, in accordance with an embodiment.

As illustrated in FIG. 7, in accordance with an embodiment, at step 261, access is provided to a distributed data source system, which includes a plurality of nodes associated with a distributed source topology, wherein each node is associated with its own source change trace entity.

At step 262, auto-discovery of the distributed source topology is performed, and access is provided to the source change trace entity(s) at one or more nodes, for use by a capture process in capturing change data from the distributed data source.

At step 263, upon detection of one or more duplicate records in the change data associated with the distributed data source, a deduplication process performs an automatic deduplication of the duplicate records.

At step 264, while continuing to perform auto-discovery of the distributed source topology associated with the distributed data source, a determination can be made as to the presence of one or more new nodes, and/or the unavailability of one or more nodes.

At step 265, in the event a source node is determined as being unavailable, a recovery process selects a replica node at the distributed data source, from which to obtain or replay change data records.

At step 266, captured change data that is read from the distributed data source is converted to a canonical format output, for communication to and/or consumption by one or more target systems.

Additional descriptions of the above steps are also provided below, in accordance with various embodiments.

Automatic Discovery of Distributed Source Topology and Capture Components

Since distributed data source systems are scalable, they may include many capture components to look for change data. In accordance with an embodiment, the system can auto-discover end-points of change data (for example, a commit log on various nodes in a cluster, or tables on target nodes, or any other source of change data), such that the change data capture process can accommodate distributed source topology changes in the distributed system, including when new components are added, or components are removed, such that the different endpoints may change dynamically during runtime.

For example, in accordance with an embodiment that uses Cassandra, the JARs which are shipped with the binary files provide a NodeProbe class which can be used to retrieve information about the Cassandra cluster (ring). A NodeProbe instance can establish a connection with any node in the cluster; and a single connection to a node enables access to all the required information about the entire cluster.
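
For purposes of illustration, a minimal sketch of such topology discovery is shown below; it assumes a Cassandra version in which NodeProbe exposes a (host, port) constructor and no-argument getLiveNodes() and getUnreachableNodes() methods (signatures vary between Cassandra releases), and the host address and JMX port shown are placeholders:

    // Minimal sketch, assuming NodeProbe(host, port), getLiveNodes() and getUnreachableNodes()
    // are available in the Cassandra version in use; signatures vary between releases.
    import java.util.List;
    import org.apache.cassandra.tools.NodeProbe;

    public class TopologyDiscoverySketch {
        public static void main(String[] args) throws Exception {
            // A single JMX connection to any one node is enough to see the whole ring.
            NodeProbe probe = new NodeProbe("127.0.0.1", 7199);
            try {
                List<String> live = probe.getLiveNodes();
                List<String> down = probe.getUnreachableNodes();
                System.out.println("Live nodes: " + live);
                System.out.println("Unreachable nodes: " + down);
                // The capture process would then locate each live node's commit logs
                // and keep track of the origination node of every record it reads.
            } finally {
                probe.close();
            }
        }
    }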

In accordance with an embodiment, the CDC process manager can register and listen for changes/events, such as, for example: node de-commissioned; node was shut down; new node added; node came up (booted); keyspace added/removed/modified; or table added/removed/modified.

In accordance with an embodiment, when a node is shut down or de-commissioned from the cluster/ring, the CDC process manager can take the following actions: clear the deduplication cache to remove all the tokens associated with this node; find the replica nodes of the node which was removed; find the last record read from the node which went down; find the matching record in any of the replica records; replay records from any replica node which has the maximum record history; update the deduplication cache to link the last record's token with the new replica node; and close communication to the node which went down.
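
For purposes of illustration, the following Java sketch outlines the sequence of actions listed above when a node leaves the ring; the interfaces and method names are hypothetical placeholders:

    // Illustrative sketch of the node-removal handling described above; all names are hypothetical.
    import java.util.Comparator;
    import java.util.List;
    import java.util.Optional;

    public class NodeDownHandlerSketch {

        interface DedupCache   { void removeTokensFor(String node); }
        interface HistoryQueue {
            Optional<String> lastRecordFrom(String node);
            List<String> recordsAfter(String replicaNode, String matchingRecord);
            int depth(String replicaNode);
        }
        interface ReplayTarget { void replay(List<String> records); }
        interface NodeChannel  { void close(String node); }

        void onNodeDown(String failedNode, List<String> replicaNodes, DedupCache cache,
                        HistoryQueue history, ReplayTarget output, NodeChannel channels) {
            // 1. Clear the deduplication cache of all tokens associated with the failed node.
            cache.removeTokensFor(failedNode);

            // 2. Find the last record that was read from the failed node.
            Optional<String> lastRecord = history.lastRecordFrom(failedNode);

            // 3. Among the replica nodes, prefer the one with the maximum record history.
            String replica = replicaNodes.stream()
                    .max(Comparator.comparingInt(history::depth))
                    .orElseThrow(() -> new IllegalStateException("no live replica available"));

            // 4. Replay any records past the matching record from the chosen replica,
            //    so the switch-over introduces neither data loss nor duplicate records.
            lastRecord.ifPresent(record -> output.replay(history.recordsAfter(replica, record)));

            // 5. Close communication to the node which went down; the deduplication cache is
            //    enriched again (token -> replica) as records arrive from the replica.
            channels.close(failedNode);
        }
    }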

Deduplication Process and Filtering of Data from Multiple Replicas

Distributed systems generally include data redundancy, so that they are highly available. In accordance with an embodiment, the change data capture process from such a system can filter out duplicate records, and if needed also apply a timestamp filter.

In accordance with an embodiment, the deduplication process can also accommodate changes in the distributed source topology of the distributed system. For example, the change capture process can read records from a replica node/component when an active component/node which was used to fetch the data record goes down. During deduplication, the change data capture process chooses the data record from a node which feeds the data record first. This way the capture process can read data records from nodes with lower latency.

FIG. 8 illustrates a flowchart of a deduplication process, for use with capture of change data from a distributed data source, in accordance with an embodiment.

As illustrated in FIG. 8, in accordance with an embodiment, at step 271, a record is read from a source node.

At step 272, the distribution mechanism (e.g., in a Cassandra environment, a partitioner) is read from the source nodes.

At step 273, a token (e.g., in a Cassandra environment, a partition token) is generated from the record using the determined distribution mechanism.

At step 274, the token is looked up in the deduplication cache.

At step 275, if the token exists in the deduplication cache, the node associated with this token is fetched from the deduplication cache.

At step 276, if the node in the deduplication cache matches the source node, the record is passed through to the capture process.

At step 277, if the node in the deduplication cache does not match the source node, then the respective record is a duplicate record from a different node; this record is filtered out and is not passed to the capture process.

At step 278, if the token does not exist in the deduplication cache, the source node and token are inserted into the deduplication cache; and the record is passed through to the capture process to generate the canonical format output.

FIG. 9 illustrates a flow diagram of a system for capture of change data from a distributed data source, including a deduplication process, in accordance with an embodiment.

As illustrated in FIG. 9, in accordance with an embodiment, the deduplication process 280-302 shown therein can: read a record from any source node; read the distribution mechanism from the source node; generate a token from the record using the distribution mechanism; and look up the token in the deduplication cache.

In accordance with an embodiment, if the token exists in the deduplication cache, then the process fetches the node associated with this token from the deduplication cache.

In accordance with an embodiment, if the node in the deduplication cache matches the source node, then the process passes through the record to the capture process.

In accordance with an embodiment, if the node in the deduplication cache does not match the source node, the respective record is a duplicate record from a different node. This record is filtered out and not passed to the capture process.

In accordance with an embodiment, if the token does not exist in the deduplication cache, then the process inserts the source node and token into the deduplication cache; and passes through the record to the capture process to generate the canonical format output.
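
For purposes of illustration, the deduplication decision described in steps 271-278 above can be sketched in Java as follows; the use of a long token value and a simple in-memory map are illustrative choices, with the actual token type depending on the distribution mechanism of the source system:

    // Minimal sketch of the deduplication decision; the token type and map are illustrative only.
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class DedupDecisionSketch {

        private final Map<Long, String> tokenToNode = new ConcurrentHashMap<>();

        /** Returns true if the record should be passed through to the capture process. */
        boolean accept(long token, String sourceNode) {
            String owner = tokenToNode.get(token);
            if (owner == null) {
                // Token not seen before: remember which node feeds it, and pass the record through.
                tokenToNode.put(token, sourceNode);
                return true;
            }
            // Token already owned: pass the record through only if it comes from the owning node;
            // otherwise it is a duplicate pushed by a different replica and is filtered out.
            return owner.equals(sourceNode);
        }
    }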

For example, in accordance with an embodiment for use with a Cassandra environment, the following steps can be performed: When any row for a table is read from the commit log, the CDC process manager generates a partition token for the particular row based on the partition key, and also caches the node address of the origin of this row. The partitioner used to generate the partition token is dynamically fetched from the live node. Every record in a distributed source is located in a partition (or an address range, or in general a storage range) within a node. There could be more than one copy of the record on different nodes. A partition token is generated to indicate a partition within nodes. The partition token generation is performed using a distribution mechanism (e.g., partitioning) associated with the distributed source system. The CDC process manager first fetches information about the partitioning mechanism used by the distributed live source node. The CDC process manager generates a partition token from every record read from the commit log, and builds a cache with an association of partition token and node address. When a new row is processed, the CDC process manager checks the cache for a token match. If the token exists in the CDC process manager cache, it checks the origin (node) of the source row. If the origin (node) of the source row matches the node in the cache, this row data is passed on. From now on, any new rows for the same partition will be accepted from this node. If the origin (node) of the source row differs from the node in the cache, this row is a duplicate and will be filtered out.

In accordance with an embodiment, there is a possibility that nodes may crash/shutdown or be de-commissioned. If a node is down, and the CDC process manager was in the process of reading rows for a particular token from the node which went down, the deduplication cache would have some invalid entries. In this scenario, the CDC process manager will start accepting the same row from a different node. This is accomplished by refreshing the deduplication cache based on the current state of the distributed source topology.

Additionally, if the extract component/capture process is stopped and restarted, the deduplication cache is rebuilt at startup to avoid duplicates.

FIG. 10 further illustrates a flow diagram of a system for capture of change data from a distributed data source, in accordance with an embodiment.

As illustrated in FIG. 10, in accordance with an embodiment, the processes 310-334 shown therein include that the cache with the token and node mapping is serialized to persistent storage when the extract component/capture process is stopped; and is de-serialized when the extract component/capture process is restarted.
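
For purposes of illustration, the serialization and de-serialization of the token-to-node mapping could look like the following Java sketch; the file location and the simple line-per-entry format are arbitrary illustrative choices, not the actual on-disk format:

    // Illustrative sketch of persisting and restoring the token-to-node mapping (FIG. 10);
    // the file format shown here is an arbitrary choice for illustration.
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class DedupCachePersistenceSketch {

        static void save(Map<Long, String> tokenToNode, Path file) throws IOException {
            String body = tokenToNode.entrySet().stream()
                    .map(e -> e.getKey() + "=" + e.getValue())
                    .collect(Collectors.joining(System.lineSeparator()));
            Files.writeString(file, body);        // committed whenever the cache is modified, or on stop
        }

        static Map<Long, String> load(Path file) throws IOException {
            Map<Long, String> cache = new HashMap<>();
            if (Files.exists(file)) {
                for (String line : Files.readAllLines(file)) {
                    if (!line.isBlank()) {
                        String[] parts = line.split("=", 2);
                        cache.put(Long.parseLong(parts[0]), parts[1]);
                    }
                }
            }
            return cache;                         // rebuilt at startup to avoid duplicates
        }
    }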

Preparation of Change Data for use with Heterogeneous Targets

In accordance with an embodiment, the change capture process can convert the data that is read from a distributed system into a canonical format output which can be consumed by any heterogeneous target system. A new target can be supported by introducing a pluggable adapter component to read the canonical change capture data and convert it to the target system format.

In accordance with an embodiment, based on the target system, the canonical format output data record can be transformed to suit the target. For example, an INSERT can be applied as an UPSERT on the target system.

In accordance with an embodiment, the canonical format output can also embed the information about the component/node in the distributed system where the data was captured. In accordance with an embodiment, for example, the system can use an Oracle GoldenGate trail format as a canonical format output.

Typically, when a client application reads data from a distributed source system with data redundancy, the client application is only provided with the data record and not the source node (on the distributed system) of the data record. The data record may be fetched by the distributed source system client (e.g., in a Cassandra environment, a CQLSH client) from any of the live nodes. In accordance with an embodiment, the distributed capture application converts the data record into a canonical format output which also has information about the source node which generated this data record.
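
For purposes of illustration, a canonical change record that embeds its origin node could be modeled as in the following Java sketch; the field set shown is hypothetical and is not the actual Oracle GoldenGate trail format:

    // Illustrative sketch of a canonical change record carrying its origin node; fields are hypothetical.
    import java.util.Map;

    public final class CanonicalRecordSketch {
        public enum Op { INSERT, UPDATE, DELETE }

        private final String table;                       // e.g., keyspace.table
        private final Op operation;
        private final Map<String, Object> columnValues;
        private final String sourceNode;                  // node in the distributed source that produced the change
        private final String sourcePosition;              // node-local position, used for recovery

        public CanonicalRecordSketch(String table, Op operation, Map<String, Object> columnValues,
                                     String sourceNode, String sourcePosition) {
            this.table = table;
            this.operation = operation;
            this.columnValues = Map.copyOf(columnValues);
            this.sourceNode = sourceNode;
            this.sourcePosition = sourcePosition;
        }

        public String sourceNode()     { return sourceNode; }
        public String sourcePosition() { return sourcePosition; }
    }

In such a sketch, a pluggable adapter component for a given target would read records of this shape and transform them (for example, applying an INSERT as an UPSERT) to suit that target.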

Change Data Capture Recovery Process

In accordance with an embodiment, when an active node goes down, the capture process can replay records from another replica node automatically, from a history cache of replica nodes. This helps avoid the possibility of data loss when the capture process switches from one node to another due to a distributed source topology change. Even if the capture process crashes abruptly (e.g., due to a kill -9 signal), it can recover and position into the distributed system without data loss.

For example, in accordance with an embodiment that uses Cassandra, when a node which was actively feeding records goes down, a CassandraTopologyManager can select a replica node and look up the last record which was processed by the node which went down. If there is more than one replica node with a matching record, the replica with the maximum record history is selected to feed the token found in the last record of the node which was shut down.

In accordance with an embodiment, if a matching record is not found in any of the replicas, again the replica with the maximum record history is selected and also a warning message may be logged to indicate a possible data loss. A parameter can be provided to control warning or shutdown action (e.g., ABEND extract action) in this scenario.

Distributed source systems may or may not have a global identifier to indicate a unique position to start reading records. In many cases, it is possible to identify a unique position within every node/component in the distributed system. If the unique positions from all the nodes in a distributed source system are accumulated, this provides a unique position for the entire distributed system.

In accordance with an embodiment, the capture process can restart, recover, and reposition without missing data records, by generating a global restart/recover position for the distributed source system.

Distributed source nodes may have various forms of position information. The capture process can build a unique position adapting to the format of the source node positions.

In accordance with an embodiment, the capture process periodically commits the global position into a position storage that enables saving of checkpoint information, including global recovery positioning information for one or more nodes, and, in those environments in which a sequence identifier (ID) is used, a last-used sequence ID. Committing data to a persistent storage is an expensive operation, and considering that performance is important for the capture process, the commit will generally happen at a configurable period, for example N seconds or N minutes. Since the global position is committed at periodic intervals, there is a risk, if the capture process crashes inside the interval window, that the committed global position may not be the latest.
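
For purposes of illustration, the periodic commit of the global position might be scheduled as in the following Java sketch; the PositionStore abstraction and the interval handling are illustrative placeholders for the actual, configurable behavior:

    // Illustrative sketch of committing the global recovery position at a configurable period.
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class PeriodicCheckpointSketch {

        interface PositionStore { void commit(Map<String, String> nodePositions, long sequenceId); }

        private final Map<String, String> nodePositions = new ConcurrentHashMap<>();
        private volatile long lastSequenceId = -1;

        void start(PositionStore store, long intervalSeconds) {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            // Committing to persistent storage is expensive, so the global position is committed
            // at a configurable period rather than on every record.
            scheduler.scheduleAtFixedRate(
                    () -> store.commit(Map.copyOf(nodePositions), lastSequenceId),
                    intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
        }

        void onRecord(String node, String position, long sequenceId) {
            nodePositions.put(node, position);
            lastSequenceId = sequenceId;
        }
    }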

To address this, in accordance with an embodiment, the capture process recovery mechanism is resilient, and can recover without data loss or duplicates even in this case, by extracting and merging position information for all the nodes from the canonical output component and the position storage component. When a source node is shut down or crashes, the deduplication cache is updated to remove all the tokens associated with the respective source node. This allows the process to accept records with the same token from another source replica node which is alive. The selection of the new replica node is as follows: The process looks up all the live replica nodes. The process keeps a history cache of the records which were earlier filtered out as duplicates. The process searches the history cache for the last record fed from the source node in the replica nodes, and selects the replica node with a matching record. If there is more than one replica node, the process chooses the replica node with the maximum number of records in the history cache. By replaying records from the replica node, the process is able to gracefully switch from one node to another node without data loss.

In this manner, the system can handle errors at the data source, or within the capture process; and provides resiliency, since the capture process just needs to know the location of the trace entity.

As described above, examples of trace entity(s) include, e.g., in a Cassandra environment, commit logs. The partitioner used to generate the token is dynamically obtained by the reader. Each record in a source can be located at some address range.

In accordance with an embodiment, there can be more than one copy of a record on various nodes. For example, to meet the need for high availability, the same record may be available in three or more nodes. The token can be generated to indicate, for example, a partition within nodes. The recovery process attempts to position within the distributed source, using whichever distribution mechanism the distributed source already uses. For example, Cassandra uses Murmur3 as a distribution mechanism.

Once the extract component receives the record, the distribution mechanism can be determined, by which the system can generate the token for every record in the trace and build the deduplication cache. Generally, this means that all of the records from a bucket at a node will be read from that same node/bucket, in a somewhat sticky manner, provided there are no changes in the topology of the distributed source system (e.g., nodes were not added or removed). In accordance with an embodiment:

If a lookup on the node is successful, the record read is passed through.

If the node doesn't match, then we are reading from a new node, and follow the deduplication process.

If there is no token at all, then this is a new node to add to the deduplication cache.

In accordance with an embodiment, a record itself will not indicate which node the data came from. However, the canonical format output (e.g., in a GoldenGate environment, the trail file) will include that information. When the canonical format output is written, the position storage must be updated to show which data came from which node. If there is a crash, the extract component can review the canonical format output, to find which nodes contribute to which records, and then use the position storage to find out the positions for each node.

In accordance with an embodiment, checkpoints can be written to a checkpoint file such as a JavaScript Object Notation (JSON) file, which includes information associated with the checkpoint, and which is updated on a periodic basis. A checkpoint event updates the position storage accordingly.
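
For purposes of illustration, and using the positions from the recovery scenarios described below, such a checkpoint file might contain content along the following lines (the field names shown are hypothetical, and the actual layout is implementation-specific):

    {
      "sequenceId": 4,
      "nodePositions": {
        "Node1": "Position1",
        "Node2": "Position1",
        "Node3": "Position2"
      }
    }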

In accordance with an embodiment, during the recovery scan, the extract component can determine each position associated with the nodes, using the information provided by the canonical format output, to determine global recovery positioning information for the nodes; and can then restart with those positions.

In accordance with an embodiment, during recovery, if duplicate records are received from the nodes, they are not discarded immediately, but are instead placed in the history queue for a period of time, for use during recovery. A record match can be performed against the history queue to determine which record to position to, for that token.

FIG. 11 illustrates a flowchart of a recovery process, for use with capture of change data from a distributed data source, in accordance with an embodiment.

As illustrated in FIG. 11, in accordance with an embodiment, at step 341, a unique position is identified within each of a plurality of nodes of the distributed data source system; and the unique positions from all the nodes are accumulated, to provide global recovery positioning information for use with the distributed data source.

At step 342, the capture process periodically commits the global recovery positioning information to a position storage.

At step 343, upon determining a source node is shut down or has crashed, the deduplication cache is updated to remove all of the tokens associated with that source node.

At step 344, in some instances, the global recovery positioning information is read from the canonical format output.

At step 345, in some instances, the history queue is searched for the last record fed from the source node in the replica nodes, to select the replica node with a matching record; or, if there is more than one replica node, to choose the replica node with the maximum number of records in the history queue; and records are replayed from the replica node.
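
For purposes of illustration, the merge of the checkpointed positions with positions scanned forward from the canonical format output (as used in the recovery scenarios below) can be sketched as follows in Java; the type names are hypothetical:

    // Illustrative sketch: checkpointed positions are the baseline, and the forward scan of the
    // canonical format output overrides them with the last position seen for each node.
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class RecoveryPositionSketch {

        record OutputEntry(String node, String position, long sequenceId) { }

        static Map<String, String> restartPositions(Map<String, String> checkpointed,
                                                    List<OutputEntry> scannedForward) {
            Map<String, String> merged = new HashMap<>(checkpointed);
            for (OutputEntry entry : scannedForward) {
                merged.put(entry.node(), entry.position());
            }
            return merged;
        }
    }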

FIG. 12 illustrates a flow diagram of a system for capture of change data from a distributed data source, including a recovery process 350-364, in accordance with an embodiment.

Additional Deduplication and Recovery Examples (Cassandra)

As described above, in accordance with an embodiment, the system can perform a deduplication process that provides automatic deduplication of the data provided by the distributed data source system.

FIG. 13 illustrates a flow diagram of a system for capture of change data from a distributed data source, in accordance with another embodiment.

As illustrated in FIG. 13, in accordance with another embodiment, such as, for example, a Cassandra database, the deduplication process 412-444 shown therein can be used to provide automatic deduplication of the data provided by the distributed data source.

FIG. 14 illustrates a flow diagram of a system for capture of change data from a distributed data source, in accordance with yet another embodiment.

As illustrated in FIG. 14, in accordance with yet another embodiment, the deduplication process 462-494 shown therein can be used to provide automatic deduplication of the data provided by the distributed data source.

In accordance with other embodiments, other types of distributed data sources or databases, or deduplication processes, can be supported. The examples described above are not intended to be restrictive.

Recovery Scenarios

FIGS. 15-18 illustrate examples of recovery scenarios for use with a system for capture of change data from a distributed data source, in accordance with various embodiments.

In some of the example recovery scenarios described below, the recovery process considers, for purposes of illustration, the use of a sequence ID, as might be used, for example, with a Cassandra environment. However, the use of such a sequence ID is optional, and with other types of distributed data sources it may not be utilized.

Recovery Scenario 1—Checkpoint Update Event (No Crash/Stop)

In accordance with an embodiment illustrated in FIG. 15, in this recovery scenario, upon receiving a checkpoint complete event, the checkpoint information is updated in the position storage, and the tokens are serialized to the canonical format output.

Node            Node1            Node3            Node2            Node3            Node1            Node1
Sequence ID     1                2                3                4 (Checkpoint)   5                6
Transaction ID  Node1:Position1  Node3:Position1  Node2:Position1  Node3:Position2  Node1:Position2  Node1:Position3

If, in this example, there is a checkpoint complete event after writing the record with sequence ID:4 to the canonical format output, the checkpoint information in position storage will include the following global recovery positioning information: Node1:Position1; Node3:Position2; Node2:Position1.

In environments in which a sequence ID is used, the last-used sequence ID:4 can also be stored in the checkpoint information.

Recovery Scenario 2—Graceful Stop and Restart

In accordance with an embodiment, using the previous example, when instead a stop command is issued to the extract component, a checkpoint complete event on the last record is processed.

In this example, the record with sequence ID:6 will trigger the checkpoint information to be written to position storage with its sequence ID value updated to 6. The checkpoint information will be updated with the following positions: Node1:Position3; Node3:Position2; Node2:Position1.

Upon restart, the extract component has all the information in the checkpoint information, to perform an exact positioning onto each of the nodes.

Recovery Scenario 3—Extract Crash and Restart

In accordance with an embodiment illustrated in FIG. 16, in this recovery scenario, on a checkpoint event, the checkpoint information written to position storage has: Sequence ID:2; Node1:Position1; Node3:Position1.

Node            Node1            Node3            Node2            Node3            Node1            Node1            Node3
Sequence ID     1                2 (Checkpoint)   3                4                5                6                (Partial record in canonical format output)
Transaction ID  Node1:Position1  Node3:Position1  Node2:Position1  Node3:Position2  Node1:Position2  Node1:Position3

In accordance with an embodiment, if the extract component is killed abruptly while writing the record with sequence ID:7, then the last record saved in the canonical format output will be a partial record.

In accordance with an embodiment, upon restart, the following actions will be performed: scan the canonical format output forward from the last check-pointed position, that is, the record with the sequence ID:2; accumulate the node positions from the transaction ID token in the canonical format output; and store the last seen transaction ID token per node: Node1:Position3; Node2:Position1; Node3:Position2.

In this example, during the scan, the last-used sequence ID from the canonical format output is 6. The extract component will pass on the sequence ID and the node positions, so that the global recovery positioning information will be: Sequence ID:6; Node1:Position3; Node2:Position1; Node3:Position2.

Recovery Scenario 4—Distributed Source Topology Changes

In accordance with an embodiment illustrated in FIG. 17, if in this example Node1 is associated with a replica node ReplicaNode1, and a checkpoint event occurs after processing the record with sequence ID:6, then the current position of the extract component will be: Sequence ID:6; Node1:Position3; Node2:Position1; Node3:Position2.

Node            Node1            Node3            Node2            Node3            Node1            Node1
Sequence ID     1                2                3                4                5                6
Transaction ID  Node1:Position1  Node3:Position1  Node2:Position1  Node3:Position2  Node1:Position2  Node1:Position3

Node1 Down—Scenario 1

In accordance with an embodiment using the above example, when Node1 goes down, the extract component will look for a replica node of Node1. It will find ReplicaNode1, and from now on, all the tokens which were fed from Node1 will now be fed from ReplicaNode1.

In accordance with an embodiment, as part of the selection of the new replica node (ReplicaNode1), the extract component will search its history queue for the last record which was fed from Node1, which in this example was Node1:Position3. Any records which are found in the history queue are replayed into the canonical format output file.

If, in this example, the history for ReplicaNode1 is:

Node            ReplicaNode1            ReplicaNode1            ReplicaNode1                           ReplicaNode1            ReplicaNode1
Sequence ID     4                       5                       6                                      7                       8
Transaction ID  ReplicaNode1:Position1  ReplicaNode1:Position2  ReplicaNode1:Position3 (Record match)  ReplicaNode1:Position4  ReplicaNode1:Position5

Then, in accordance with an embodiment, if the last record from Node1 which matched in ReplicaNode1 is the third record, the system will replay records with positions ReplicaNode1:Position4 and ReplicaNode1:Position5. If there was a checkpoint complete event after processing ReplicaNode1:Position5, the positioning information will be: Sequence ID:8; Node1:Position3; Node2:Position1; Node3:Position2; ReplicaNode1:Position5.

In this manner, the extract component can react to the distributed source topology change without duplication of data. On graceful stop and restart, new records will be read from nodes ReplicaNode1, Node2 and Node3. Any records from Node1 are filtered out (even if Node1 has booted).

Node1 Down—Scenario 2

In accordance with an embodiment, consider the above example, but instead assume a crash occurs before the extract component can checkpoint the records from ReplicaNode1.

In this example, the same records as above were sent to the extract component from ReplicaNode1; the extract component has crashed; and the canonical format output has the records ReplicaNode1:Position4 and ReplicaNode1:Position5, which were not check-pointed.

In accordance with an embodiment, upon restart, the global recovery positioning information will be: Sequence ID:8; Node1:Position3; Node2:Position1; Node3:Position2; ReplicaNode1:Position5; which means that, if Node1 is down, the records will be fed from ReplicaNode1, and since we have the position for ReplicaNode1, there will be no duplication of data.

Upon restart, if both Node1 and ReplicaNode1 are up, Node1 would continue feeding records after position ReplicaNode1:Position5, and there will be no duplication of data.

Upon restart, if ReplicaNode1 is down and Node1 is up, Node1 will start feeding the records to the extract component by positioning itself to Node1:Position3. This means that the records which were read earlier from ReplicaNode1 (with sequence ID:7 and ID:8) will remain as duplicate records in the canonical format output.

Node1 Down Scenario 3

In accordance with an embodiment, consider the above example, but instead assume that the extract component has crashed before records were written to the canonical format output.

In accordance with an embodiment, upon restart, the extract positioning would be: Sequence ID:8; Node1:Position3; Node2:Position1; Node3:Position2.

Upon restart, if Node1 is up, there is no duplication of data.

Upon restart, if Node1 is still down, the extract component starts to read from the replica ReplicaNode1, which may lead to duplication of records, if Node1 has fed these records earlier.

Node1 Down—Scenario 4

In accordance with an embodiment illustrated in FIG. 18, in this example ReplicaNode1 does not have enough historical records; for example, the records in the ReplicaNode1 history queue are: ReplicaNode1:Position6; ReplicaNode1:Position7; ReplicaNode1:Position8; ReplicaNode1:Position9.

There is a potential data loss here, since the history queue for ReplicaNode1 is not deep enough. By default, the extract component will ABEND in this scenario, warning the customer about the situation. Alternatively, an administrator can choose to turn off the ABEND feature and restart extract.

In accordance with an embodiment, upon restart the positioning will be: Sequence ID:6; Node1:Position3; Node2:Position1; Node3:Position2. If Node1 is still down, then data will be sourced from ReplicaNode1 starting from its first available source change trace entity, which could lead to duplication of records. If Node1 is up, there will be no duplication of data.

Node1 Down—Scenario 5

In accordance with an embodiment, in this recovery scenario the extract component has processed some records and the global recovery positioning information is: Sequence ID:6; Node1:Position3; Node2:Position1; Node3:Position2.

In accordance with an embodiment, upon restart, if Node1 is down, the extract component will start reading records from the replica node of Node1, that is, ReplicaNode1. Since the position information about ReplicaNode1 is not available, all the available records from ReplicaNode1 will be read, which could lead to duplication of records in the canonical format output file.

Example Implementation (Cassandra)

The following section provides, for purposes of illustration, a description of an example embodiment for capture of change data from a distributed data source system, such as, for example, a Cassandra database.

In accordance with other embodiments, other types of distributed data sources or databases can be supported. For purposes of illustration, various details are provided below in order to provide an understanding of various embodiments. However, embodiments can also be practiced without specific details. The following description is not intended to be restrictive.

Cassandra is a massively scalable open source NoSQL database, which delivers continuous availability, linear scalability, and operational simplicity across many commodity servers with no single point of failure. A Cassandra system addresses the problem of failures by employing a peer-to-peer distributed system across homogeneous nodes, where data is distributed among all nodes in a cluster (ring).

In accordance with various embodiments, the system can include or utilize some or all of the following features:

Node: hardware to store data, a member of the cluster.

Datacenter: a collection of related nodes. Using separate datacenters prevents transactions from being impacted by other workloads and keeps requests close to each other for lower latency. Datacenters generally do not span physical locations.

Cluster: a cluster contains one or more datacenters, and can span physical locations.

CommitLog: all data is written first to the commit log for durability (write-ahead logging). After all its data has been flushed to SSTables, commit logs can be archived, deleted, or recycled. If the CDC feature is enabled, the archived (CDC) commit logs reside in the pre-configured CDC directory (default value: $CASSANDRA_HOME/data/cdc_raw) and the active commit logs reside in a pre-configured active log directory (default value: $CASSANDRA_HOME/data/commitlog). Every node has its own copy of commit logs.

SSTable: a sorted string table (SSTable) is an immutable data file to which the system writes memtables periodically. SSTables are append only, and are stored on disk sequentially and maintained for each table.

MemTable: a memtable is a cache residing in memory which has not been flushed to disk (SSTable) yet.

Gossip: a peer-to-peer communication protocol to discover and share location and state information about other nodes in the ring.

Partitioner: a partitioner determines which node will receive the first replica of a piece of data, and how to distribute data across other replica nodes in the cluster. A partitioner is a hash function that derives a partition token from the primary key of a row. The Murmur3Partitioner is the default partitioner in the latest versions of Cassandra.

Replication factor: the total number of replicas of data to be stored across the cluster.

Replica placement strategy: Cassandra stores copies (replicas) of data on multiple nodes to ensure reliability and fault tolerance. A replication strategy determines which nodes to place replicas on.

Keyspace: a keyspace is similar to a schema in the RDBMS world, and acts as a container for application data. When defining a keyspace, one must specify a replication strategy and a replication factor, for example:

    CREATE KEYSPACE ks_rep3
      WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 3};

Snitch: a snitch defines groups of machines into datacenters and racks (the distributed source topology) that the replication strategy uses to place replicas.

Primary key: a primary key for a table is composed of partition key column(s) and optional clustering key column(s). Partition key columns and clustering columns may have the responsibility to identify a particular row in addition to other duties. Partition key columns are responsible for identifying the nodes where the row needs to fit in. A partitioner is used to generate a partition token (hash value) based on the partition key column in the table. This partition token is mapped to certain node(s) in the cluster. Clustering key columns are used to define the sort order of the data within a node, for example:

    PRIMARY KEY (col1): Partition key column col1; no clustering key column.
    PRIMARY KEY ((col1, col2), col3): Partition key columns [col1, col2]; Clustering key column [col3].

In accordance with an embodiment, a CDC feature provides a mechanism to tag specific tables for CDC (archival/recovery/backup). The feature can be enabled on a table by setting the table property cdc=true (either when creating the table or by altering it). Additionally, the CDC feature must be enabled node-wise by setting the property cdc_enabled: true in the respective node configuration file, cassandra.yaml. Cassandra uses write-ahead logging to ensure recovery. Any changes to the database are first written to a commit log. A copy of the database changes is also retained in-memory, as a memtable. The database changes are eventually flushed out of memtables to SSTables. SSTables are files on disk which are the persistent database storage.

In accordance with an embodiment, Cassandra writes database changes grouped as commit log segments into commit logs under the active commit log directory. When the CDC feature is enabled on the node as well as the table in context, after Cassandra flushes memtables out to SSTables (disk), the commit log segments in the active commit log are written (archived) to a CDC commit log directory. Reading commit log segments from the CDC commit log directory and the active log directory will provide the complete history of database changes. CDC can be enabled or disabled through the cdc table property, for example:

    CREATE TABLE foo (a int, b text, PRIMARY KEY(a)) WITH cdc=true;
    ALTER TABLE foo WITH cdc=true;
    ALTER TABLE foo WITH cdc=false;

The following parameters are available in cassandra.yaml for CDC:

- cdc_enabled (default: false): Enable or disable CDC operations node-wide.
- cdc_raw_directory (default: $CASSANDRA_HOME/data/cdc_raw): Destination for CommitLogSegments to be moved after all corresponding memtables are flushed.
- cdc_free_space_in_mb (default: min of 4096 and one-eighth of the volume space): Calculated as the sum of all active CommitLogSegments that permit CDC, plus all flushed CDC segments in cdc_raw_directory.
- cdc_free_space_check_interval_ms (default: 250): When at capacity, limits the frequency with which the space taken up by cdc_raw_directory is re-checked, to prevent burning CPU cycles unnecessarily. The default is to check four times per second.

Cassandra processes data at several stages on the write path, starting with the immediate logging of a write: logging data in the commit log; writing data to the memtable; flushing data from the memtable; and storing data on disk in SSTables.

Data in the active commit log is purged after its corresponding data in the memtable is flushed to an SSTable. Any database operation is first written to the active commit log, followed by a copy in the memtable. Data in the active commit log is persisted until the memtable flushes it to an SSTable. A memtable flush can occur in the following scenarios:

The parameters memtable_heap_space_in_mb (typically one-quarter of the heap size), memtable_offheap_space_in_mb (typically one-quarter of the heap size) and memtable_cleanup_threshold determine the memtable flush frequency.

memtable_heap_space_in_mb: the amount of on-heap memory allocated for memtables.

memtable_offheap_space_in_mb: the total amount of off-heap memory allocated for memtables.

memtable_cleanup_threshold: defaults to 1/(memtable_flush_writers+1).

In accordance with an embodiment, Cassandra adds memtable_heap_space_in_mb to memtable_offheap_space_in_mb and multiplies the total by memtable_cleanup_threshold to get a space amount in MB. When the total amount of memory used by all non-flushing memtables exceeds this amount, Cassandra flushes the largest memtable to disk.
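
As a minimal sketch of this arithmetic (the class, field, and method names below are illustrative, not Cassandra internals): for example, with 2048 MB of heap and 2048 MB of off-heap memtable space and a cleanup threshold of 0.25, flushing of the largest memtable would begin once the non-flushing memtables together exceed 1024 MB.

    // Minimal sketch of the flush-threshold arithmetic described above.
    // Field and method names are illustrative, not Cassandra's internal API.
    public class MemtableFlushPolicy {
        private final long heapSpaceMb;        // memtable_heap_space_in_mb
        private final long offheapSpaceMb;     // memtable_offheap_space_in_mb
        private final double cleanupThreshold; // memtable_cleanup_threshold

        public MemtableFlushPolicy(long heapSpaceMb, long offheapSpaceMb, double cleanupThreshold) {
            this.heapSpaceMb = heapSpaceMb;
            this.offheapSpaceMb = offheapSpaceMb;
            this.cleanupThreshold = cleanupThreshold;
        }

        // (memtable_heap_space_in_mb + memtable_offheap_space_in_mb) * memtable_cleanup_threshold
        public double flushThresholdMb() {
            return (heapSpaceMb + offheapSpaceMb) * cleanupThreshold;
        }

        // Flush the largest memtable once all non-flushing memtables exceed the threshold.
        public boolean shouldFlush(double usedByNonFlushingMemtablesMb) {
            return usedByNonFlushingMemtablesMb > flushThresholdMb();
        }
    }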

A larger value for memtable_cleanup_threshold means larger flushes, less frequent flushes and potentially less compaction activity, but also less concurrent flush activity, which can make it difficult to keep your disks saturated under heavy write load.

The parameter commitlog_segment_size_in_mb (default value of 32) determines the size of an individual commit log segment (commit log on disk). A commit log segment may be archived (moved to the CDC directory), deleted, or recycled after all its data has been flushed to SSTables. This data can potentially include commit log segments from every table in the system. A small total commit log space tends to cause more flush activity on less-active tables.

Cassandra flushes memtables to disk, creating SSTables, when the commit log space threshold has been exceeded. When the database is restarted, the active commit logs are archived (moved to the CDC directory). A nodetool flush command can be used to manually flush the memtables to SSTables, triggering commit log availability in the CDC directory.

The latency with which CDC data becomes available has a known limitation due to the reliance on CommitLogSegments being discarded before the data becomes available in cdc_raw: if a slowly written table shares a CommitLogSegment with CDC data, the CommitLogSegment won't be flushed until the system encounters either memory pressure on memtables or CommitLog limit pressure. Ultimately, this leaves a non-deterministic element to when data becomes available for CDC consumption, unless a consumer parses live CommitLogSegments.

In accordance with an embodiment, to address this limitation and make semi-realtime CDC consumption more friendly to end-users, the system supports the following:

Consumers parse hard links of active CommitLogSegments in cdc_raw instead of waiting for flush/discard and file move.

Cassandra stores an offset of the highest-seen CDC mutation (Mutation) in a separate index (idx) file per commit log segment in cdc_raw. Clients tail this index file, compare it against their local last-parsed offset when it changes, and parse the corresponding commit log segment using their last-parsed offset as the minimum.

Cassandra flags that index file with an offset and DONE when the file is flushed, so clients know when they can clean up.
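
The following is a minimal Java sketch of a consumer loop that follows this index-file convention; the index layout assumed here (an offset on the first line, optionally followed by DONE) and the class and method names are illustrative assumptions, not the exact Cassandra file format.

    import java.io.RandomAccessFile;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;

    // Minimal sketch of a CDC consumer that tails a per-segment index (idx) file
    // and reads newly written bytes of the matching commit log segment.
    // The index layout (an offset, optionally followed by "DONE") is assumed here
    // for illustration; it is not taken from the Cassandra source.
    public class CdcSegmentTailer {
        private long lastParsedOffset = 0;

        public void poll(Path idxFile, Path segmentFile) throws Exception {
            List<String> lines = Files.readAllLines(idxFile, StandardCharsets.UTF_8);
            if (lines.isEmpty()) return;
            long highestSeenOffset = Long.parseLong(lines.get(0).trim());
            boolean done = lines.size() > 1 && "DONE".equalsIgnoreCase(lines.get(1).trim());

            if (highestSeenOffset > lastParsedOffset) {
                try (RandomAccessFile segment = new RandomAccessFile(segmentFile.toFile(), "r")) {
                    segment.seek(lastParsedOffset);               // resume from the last parsed offset
                    byte[] chunk = new byte[(int) (highestSeenOffset - lastParsedOffset)];
                    segment.readFully(chunk);
                    parseMutations(chunk);                        // hand the bytes to the commit log parser
                }
                lastParsedOffset = highestSeenOffset;
            }
            if (done) {
                // Segment is fully flushed; the consumer may clean up the idx and segment files.
            }
        }

        private void parseMutations(byte[] serializedMutations) {
            // Decoding of Mutation records is performed elsewhere (e.g., a CommitLogReadHandler).
        }
    }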

Capture of Cassandra Database Changes

In accordance with an embodiment, the CDC feature provides a mechanism to archive commit log segments into a CDC directory when database changes are flushed out of memtables (stored in RAM). The table data is grouped as commit log segments which are serialized into a binary format and written to commit logs.

Cassandra JARs (Java Archives), which are packaged as part of the database itself, provide an interface CommitLogReadHandler which can be implemented to decode the commit log segments. In accordance with an embodiment, the system can house an implementation of the interface CommitLogReadHandler.

In accordance with an embodiment, advantages of this approach include that reading commit logs for database changes is fast. The capture process (the extract component) can read the commit logs in the CDC directory or the active log directory and capture database changes with very minimal load on the Cassandra database server; it is a non-intrusive capture. The capture process will filter out specific table data when reading the commit logs, which also improves overall capture performance.

In accordance with an embodiment, the capture process can be configured to read database changes from archived (CDC) commit logs as well as active commit logs. Reading active commit logs provides very low latency for capture. The capture process can be stopped and restarted at any valid offset in the commit log. The capture process can be positioned based on a timestamp or on specific commit log segment offsets within the commit log. The commit logs in the CDC directory are never purged; it is the responsibility of the CDC consumer, the capture process, to do the necessary house-keeping. This means the capture process may be restarted after extended downtime and still capture the database changes.

Some considerations for this approach include that commit log read segments do not tag an INSERT or UPDATE source operation. This means the capture process would need to write all the source UPDATE operations as INSERTs (or UPSERTs, as described below) into the trail. Commit log segments do not store TRUNCATE operations. Commit log segments do not store the before image of the changed row. Commit log segments house data serialized as Mutation objects; if future versions of Cassandra modify the Mutation object format or APIs, the capture process can be modified accordingly.

Intercepting the Input Query (CQL) at each Node

In accordance with an embodiment, Cassandra JARs packed with the installation provide an interface QueryHandler which can be implemented to intercept the input user query. The input query from the user is expressed in the Cassandra Query Language (CQL), which is similar to SQL.

In accordance with an embodiment, advantages of this approach include that access to all the input queries (CQL) is available. The system should also be able to recognize UPDATE and TRUNCATE operations. Changes to Cassandra log segments would not break the functionality of the capture process, as long as the input CQL syntax does not change.

Some considerations for this approach include that the QueryHandler is an additional layer for input query processing in the Cassandra database process. Any flaw in the QueryHandler could lead to database process crashes or data corruption. The QueryHandler is an intrusive solution which has to be installed at every node in the cluster. Any latency in the QueryHandler is passed on to the Cassandra database input query processing engine and eventually to end users. Input CQL does not have any position information, as commit log segments (in the commit logs) are not available in this context; the capture process can build a pseudo position, which may be complex, or defer extract positioning capability. If the capture process was down and then restarted, any database changes made during the downtime cannot be read.

In accordance with an embodiment, UPDATE operations can be delivered as INSERTs and delivery (replicat) can use [UPDATEINSERTS + INSERTMISSINGUPDATES]. The system writes a new database type name, "CASSANDRA", in the trail, which indicates that the INSERT operations in this trail are UPSERT operations.

Access to Commit Logs

In accordance with an embodiment, a Cassandra cluster typically comprises one or more nodes (servers) which together act as one database instance. Commit logs exist on each of the nodes in the cluster. In accordance with an embodiment, options to access the commit logs include, for example:

Access over Local File System

In accordance with an embodiment, the commit logs are accessed on the machine where the extract component is running. This configuration does not involve network costs.

Access to Remote Commit Logs over NFS (Network File System) Protocol

In accordance with an embodiment, the commit logs should be made available through an NFS mount on the machine where the extract component is running. This configuration does consume network bandwidth to read the commit logs.

Access to Remote Commit Logs through SFTP

In accordance with an embodiment, the required commit logs are transferred from the remote nodes to the machine where the extract component is running, using the Secure File Transfer Protocol (SFTP). This configuration also incurs network costs to transfer the files.

Access to Remote Commit Logs through an Extract-Agent

In accordance with an embodiment, a remote program can be installed on each of the nodes, which would have the context of the extract TABLE parameters to filter out the required table data from the commit logs. This filtered commit log data should be transferred over the network to the machine where the extract component process is running. The extract component can re-assemble the data from the remote program from all the nodes and proceed with the required processing.

Discovery of Nodes

In accordance with an embodiment, node discovery by the extract component can include, for example:

Static Configuration

In accordance with an embodiment, the administrator needs to provide the following details for every node in the cluster: a list of CDC commit log directories; a list of active commit log directories; and a list of Cassandra node addresses. The commit log directories may be local directories or remote directories mounted over NFS.

Dynamic Configuration

In accordance with an embodiment, the user/operator provides the information for just one of the nodes in the cluster. The configuration for a single node may have meta-fields (like $nodeAddress) in the commit log directory path to identify the individual node address. The system then automatically discovers all the nodes in the cluster.

Cassandra Extract and CDC Process Manager

In accordance with an embodiment, a Cassandra CDC process manager is a Java application which reads, filters and transforms the raw table data (Mutation) in the commit logs. The transformed commit log data is accessed over the Java Native Interface (JNI) by a C++ library. A Cassandra VAM can be provided as a C++/C binary (shared library) which is used by the extract component process to write trail files.

Big Data VAM Module

In accordance with an embodiment, a big data VAM module is a generic VAM module proposed to be re-used for all the big data sources. It handles non-transactional table data. It is multi-threaded; one thread (the VAM API thread) interacts with the VAM API, and the second thread (the JNI reader thread) reads operation records from the respective big data source using JNI. The JNI reader thread acts as the producer of operation records, and the VAM API thread consumes these operation records. The generic VAM module uses a factory class to instantiate the specific JNI reader implementation based on the source database. The capture process uses the class CassJNIReader to interact with the Java application CDC process manager.
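
The producer/consumer arrangement described above can be illustrated with a minimal Java sketch; the class name, queue size, and method names below are hypothetical and do not mirror the actual VAM module code.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Minimal sketch of the two-thread arrangement described above: a reader thread
    // produces operation records and an API thread consumes them. OperationRecord and
    // the reader/consumer names are hypothetical; they do not mirror the actual VAM code.
    public class TwoThreadPipeline {
        static final class OperationRecord { /* decoded change record fields */ }

        private final BlockingQueue<OperationRecord> queue = new ArrayBlockingQueue<>(10_000);

        // Producer: corresponds to the JNI reader thread.
        Thread readerThread = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    OperationRecord record = readNextRecordFromSource(); // e.g., via JNI
                    queue.put(record);                                   // blocks if the consumer lags
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "jni-reader");

        // Consumer: corresponds to the VAM API thread writing to the trail.
        Thread vamApiThread = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    OperationRecord record = queue.take();
                    writeRecordToTrail(record);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "vam-api");

        private OperationRecord readNextRecordFromSource() { return new OperationRecord(); }
        private void writeRecordToTrail(OperationRecord record) { /* hand off to the VAM API */ }
    }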

CommitLogReadHandler Interface

In accordance with an embodiment, the Cassandra database process writes any changes to the database into the commit logs. Implementing the CommitLogReadHandler interface makes it possible to read the database changes in the form of Mutation objects.

Decoding Mutation from Commit Logs

In accordance with an embodiment, a class CDCHandlerImpl is the implementation of the CommitLogReadHandler interface. The CDCHandlerImpl is provided access to Mutation objects constructed from commit log read segments which exist in the commit logs. A class CassandraMutationProcessor has the responsibility of decoding the Mutation objects and transforming them into a format which can be easily accessed over JNI by the Cassandra VAM library. The CassandraMutationProcessor also generates a partition token for every Mutation record, using the partitioner which is currently used by the cluster. The partitioner used by the cluster is retrieved dynamically by using NodeProbe on any live node.
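
A sketch of such a handler is shown below; the method signatures follow the Cassandra 3.x commit log reader interface as understood here and may differ between Cassandra versions, and the decoding step is only indicated by comments.

    import org.apache.cassandra.db.Mutation;
    import org.apache.cassandra.db.commitlog.CommitLogDescriptor;
    import org.apache.cassandra.db.commitlog.CommitLogReadHandler;
    import org.apache.cassandra.db.partitions.PartitionUpdate;

    // Sketch of a CommitLogReadHandler implementation along the lines of CDCHandlerImpl.
    // Signatures are reproduced from memory of the Cassandra 3.x API and may need
    // adjustment for other versions; the decoding callback is indicated by comments only.
    public class CdcHandlerSketch implements CommitLogReadHandler {

        @Override
        public boolean shouldSkipSegmentOnError(CommitLogReadException exception) {
            return false; // stop rather than silently skip a damaged segment
        }

        @Override
        public void handleUnrecoverableError(CommitLogReadException exception) {
            throw new RuntimeException(exception);
        }

        @Override
        public void handleMutation(Mutation mutation, int size, int entryLocation, CommitLogDescriptor descriptor) {
            // Each Mutation can carry updates for several partitions/tables.
            for (PartitionUpdate update : mutation.getPartitionUpdates()) {
                // Hand the update to a CassandraMutationProcessor-style decoder, which
                // produces rows in a JNI-friendly format together with a partition token.
            }
        }
    }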

Processed Mutations

In accordance with an embodiment, raw Mutation records from the commit log segments are transformed into a CassandraCDCMutation object, which is the decoded format of the database operation.

Column Definition

In accordance with an embodiment, the column definition extracted from the Mutation object is stored in a CassandraColumnDefinition class. This class stores the column attributes.

Column Data

In accordance with an embodiment, a class ColumnData encapsulates a single column's data read from the Mutation object.

CassandraTopologyManager

In accordance with an embodiment, this class performs the following tasks on behalf of CassandraCDCProcessManager: listen/react to distributed source topology changes (node state changes) in the ring; listen/react to schema changes in the ring; maintain a deduplication cache to de-dup rows from replicas; recover from node failure by replaying records from a history cache; manage position/checkpoint information through ClusterPosition; access keyspace and table metadata through SchemaLoader; and instruct the embedded SFTP client CassandraClusterSFTPClient for remote access of the commit logs.

NodeProbe

In accordance with an embodiment, the JARs which are shipped with the Cassandra binary (installation tar) provide a NodeProbe class which can be used to retrieve information about the ring. A NodeProbe instance can establish a connection with any node in the cluster. A single connection to a node is good enough to access all the required information about the cluster.

Partitioner and Partition Tokens

In accordance with an embodiment, NodeProbe is used to retrieve the partitioner used by a live node. Every node in the cluster is assigned a range of partition tokens (hash values). When the replication factor of a keyspace is greater than one, a partition token value may point to more than one node in the cluster. The partitioners use a specific hashing algorithm to generate the partition token value. Recent versions of Cassandra use the Murmur3 partitioner to generate partition tokens.

Deduplication of Row Data from Replicas

In accordance with an embodiment, when capture is enabled on source tables which reside in a keyspace with a replication factor greater than one, the CDC process manager will be presented with more than one copy of the same row, and can filter out duplicate rows from the replicas. The following are the steps performed by the CDC process manager for deduplication (see also the sketch following these steps):

In accordance with an embodiment, when any row for a table is read from the commit log, the CDC process manager generates a partition token for the particular row based on the partition key, and also caches the node address of the origin of this row. The partitioner used to generate the partition token is dynamically fetched from the live node.

When a new row is processed, the CDC process manager checks the cache for a partition token match (generated from the row partition key).

If the partition token exists in the CDC process manager cache, it checks the origin (node) of the source row.

If the origin (node) of the source row matches the node in the cache, this row data is passed on. From now on, any new rows for the same partition key (partition token) will be accepted from this node.

If the origin (node) of the source row differs from the node in the cache, this row is a duplicate and will be filtered out.
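
These steps can be illustrated with a minimal sketch of the deduplication cache; the token type and method names below are placeholders, and the real implementation derives tokens from the cluster's configured partitioner (e.g., Murmur3).

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Minimal sketch of the replica deduplication check described in the steps above.
    // The token type (Long) and the method names are placeholders for illustration.
    public class DeduplicationCache {
        // partition token -> node address currently accepted as the source for that token
        private final Map<Long, String> tokenOwner = new ConcurrentHashMap<>();

        /** Returns true if the row should be passed on, false if it is a replica duplicate. */
        public boolean accept(long partitionToken, String originNodeAddress) {
            // The first node seen for this token becomes its accepted source.
            String owner = tokenOwner.putIfAbsent(partitionToken, originNodeAddress);
            if (owner == null || owner.equals(originNodeAddress)) {
                return true;                 // same origin as cached: pass the row on
            }
            return false;                    // different origin: duplicate from a replica, filter out
        }

        /** Invoked when a node leaves the ring, so its tokens can be re-assigned to a replica. */
        public void evictNode(String nodeAddress) {
            tokenOwner.values().removeIf(nodeAddress::equals);
        }
    }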

In accordance with an embodiment, there is a possibility that nodes may crash, shut down, or be de-commissioned. If a node is down and the CDC process manager was in the process of reading rows for a particular partition token from the node which went down, the deduplication cache in the CDC process manager would have some invalid entries. In this scenario, the CDC process manager will start accepting the same row from a different node. This is accomplished by refreshing the deduplication cache based on the current state of the ring.

Additionally, if the extract component process is stopped and restarted, the deduplication cache is rebuilt at startup to avoid duplicates. The cache with the partition token and node mapping is serialized to a file when the extract component process is stopped, and de-serialized when the extract component process is restarted.

Record Processing Flow

In accordance with an embodiment, the CDC process manager feeds the raw record of type Mutation into CassandraMutationProcessor to generate a CassandraCDCMutation object. In case the mutation has bulk operations, the output from CassandraMutationProcessor will be a list of CassandraCDCMutation records. The processed CassandraCDCMutation records then go through a filtering process: if the extract component was positioned to a start timestamp, any records with a timestamp smaller than the start timestamp will be filtered out; if a record is a duplicate from a replica node, it will be filtered out.

In accordance with an embodiment, the duplicate records which were filtered out from replicas are stored in a history cache. The depth of the history cache is configurable; the depth is based on the timestamps of the first and last records in the history queue and also on the record count. The system can also store another cache to keep track of the last unfiltered record processed from every node.
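
A minimal sketch of such a history cache, bounded both by record count and by the time span between its oldest and newest entries, is shown below; the type names, limits, and methods are illustrative only.

    import java.time.Duration;
    import java.util.ArrayDeque;
    import java.util.Deque;

    // Minimal sketch of a history cache bounded by record count and by the time span between
    // its oldest and newest entries, as described above. Names and limits are illustrative.
    public class HistoryCache<T> {
        public static final class Entry<T> {
            final long timestampMillis;
            final T record;
            Entry(long timestampMillis, T record) { this.timestampMillis = timestampMillis; this.record = record; }
        }

        private final Deque<Entry<T>> entries = new ArrayDeque<>();
        private final int maxRecords;
        private final Duration maxSpan;

        public HistoryCache(int maxRecords, Duration maxSpan) {
            this.maxRecords = maxRecords;
            this.maxSpan = maxSpan;
        }

        public void add(long timestampMillis, T record) {
            entries.addLast(new Entry<>(timestampMillis, record));
            // Evict by record count.
            while (entries.size() > maxRecords) {
                entries.removeFirst();
            }
            // Evict by the time span between the first and last records.
            while (!entries.isEmpty()
                    && timestampMillis - entries.peekFirst().timestampMillis > maxSpan.toMillis()) {
                entries.removeFirst();
            }
        }

        public Iterable<Entry<T>> snapshot() { return entries; }
    }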

Changes in the ring will lead to partition tokens (partition token ranges) being shuffled across nodes. Sometimes partition token movement may not be complete and there could be more changes in the ring. There is a possibility that the CDC process manager cannot detect the replicas for a particular partition token generated from a record from the commit log. When this scenario occurs (although it occurs very rarely), such a record is not filtered out. A message is also logged to indicate the possibility of a duplicate record.

Change Events in the Ring

In accordance with an embodiment, the CassandraTopologyManager will register and listen for the following changes/events in the ring: node de-commissioned; node was shut down; new node added; node came up (booted); keyspace added/removed/modified; table added/removed/modified.

Node De-Commissioned/Shutdown Event

In accordance with an embodiment, when a node is shut down or de-commissioned from the ring, the CassandraTopologyManager would take the following actions: clear the deduplication cache to remove all the partition tokens associated with this node; find the replica nodes of the node which was removed; find the last record read from the node which went down; find the matching record in any of the replica records; replay records from any replica node which has the maximum record history; update the deduplication cache to link the last record's partition token with the new replica node; and close the SSH connection to the node which went down.

Replaying Records from Replica Nodes

In accordance with an embodiment, when a node which was actively feeding records goes down, the CassandraTopologyManager needs to select a replica node and look up the last record which was processed by the node which went down. If there is more than one replica node with a matching record, the replica with the maximum record history is selected to feed the partition token found in the last record of the node which was shut down. If a matching record is not found in any of the replicas, again the replica with the maximum record history is selected, and a warning message may be logged to indicate a possible data loss. A parameter can be provided to control warning or ABEND extract action in this scenario.
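
The selection rule can be illustrated with a minimal sketch; the Replica interface and its methods are placeholders for whatever record-history bookkeeping the topology manager maintains.

    import java.util.Comparator;
    import java.util.List;
    import java.util.Optional;
    import java.util.stream.Collectors;

    // Minimal sketch of the replica-selection rule described above: prefer a replica whose
    // history contains the last record processed by the failed node; when several (or none)
    // match, fall back to the replica with the largest record history. Types are illustrative.
    public class ReplicaSelector {
        public interface Replica {
            String address();
            int historySize();                        // number of records retained in the history cache
            boolean historyContains(String recordId); // does this replica's history contain the record?
        }

        public Replica selectFor(String lastProcessedRecordId, List<Replica> replicas) {
            // Replicas whose history contains the last processed record.
            List<Replica> matching = replicas.stream()
                    .filter(r -> r.historyContains(lastProcessedRecordId))
                    .collect(Collectors.toList());

            List<Replica> candidates = matching.isEmpty() ? replicas : matching;
            Optional<Replica> chosen = candidates.stream()
                    .max(Comparator.comparingInt(Replica::historySize));

            if (matching.isEmpty()) {
                // No replica holds the last record: possible data loss; warn or ABEND per configuration.
            }
            return chosen.orElseThrow(() -> new IllegalStateException("no replica nodes available"));
        }
    }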

Node Added/Boot Event

In accordance with an embodiment, when a new node is added to the ring or a node which was down comes up, there is a chance that Cassandra would go about shuffling partition token ranges in the ring. If the extract component was reading data for some partition tokens from an existing node, and these partition tokens were moved to other nodes in the ring, the risk is that new data for such partition tokens will be filtered out. The CassandraTopologyManager would take the following actions to address this scenario: update the deduplication cache to check if any partition tokens are invalid due to the new node in the ring; open a connection to pull commit logs from the new node; and create a position/checkpoint entry (in casschk.json) for the new node.

Remote Commit Log Transfer

In accordance with an embodiment, the class CassandraClusterSFTPClient is an SFTP client which is used to transfer the remote commit logs for processing. This class uses the JSch library. SSH connections are opened for every live node in the cluster and commit logs are pulled onto the machine where the extract component process is running. There will be one SSH connection per node, on a separate dedicated thread. The connection is kept open as long as the node is part of the ring.
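
A minimal sketch of pulling one remote commit log segment over SFTP with the JSch library is shown below; the host, credentials, and directory paths are placeholders, and the actual CassandraClusterSFTPClient manages one long-lived connection per live node rather than connecting per transfer.

    import com.jcraft.jsch.ChannelSftp;
    import com.jcraft.jsch.JSch;
    import com.jcraft.jsch.Session;

    // Minimal sketch of pulling a remote commit log over SFTP using the JSch library,
    // along the lines of CassandraClusterSFTPClient. Host, user, key path, and directories
    // are placeholders for illustration only.
    public class CommitLogSftpPuller {
        public static void main(String[] args) throws Exception {
            JSch jsch = new JSch();
            jsch.addIdentity("/home/ogg/.ssh/id_rsa");              // placeholder private key

            Session session = jsch.getSession("cassandra", "127.0.0.2", 22);
            session.setConfig("StrictHostKeyChecking", "no");       // sketch only; verify host keys in practice
            session.connect();

            ChannelSftp sftp = (ChannelSftp) session.openChannel("sftp");
            sftp.connect();
            try {
                // Pull one CDC commit log segment into a local staging directory.
                sftp.get("/var/lib/cassandra/cdc_raw/CommitLog-6-1502860465106.log",
                         "/u01/ogg/cdc_staging/127.0.0.2/");
            } finally {
                sftp.disconnect();
                session.disconnect();
            }
        }
    }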

Cluster Positioning

In accordance with an embodiment, unlike RDBMSs, where a single log sequence number would suffice to indicate a unique position at which to start/restart/alter a capture, a Cassandra cluster with multiple nodes would need to store one position per node. For an 'n' node ring, there will be 'n' position(s). The class ClusterPosition stores the multi-node position. This class also houses another class, NodePosition, which stores the position for a single node. The class also stores the start timestamp for the extract. The position information is saved into a JSON file. The position file is updated at a configurable interval (typically every 10 seconds) and also during shutdown. The extract component positioning is based on this position file. This JSON position file can be manually edited to position the extract.

In accordance with an embodiment, an example position JSON file can be illustrated as:

    {
      "start timestamp": -1,
      "sequence id": 11774721,
      "nodes": [
        {
          "address": "127.0.0.3",
          "file": "CommitLog-6-1502860465106.log",
          "offset": 411806,
          "id": 1502860465106
        },
        {
          "address": "127.0.0.2",
          "file": "CommitLog-6-1502861017180.log",
          "offset": 46621,
          "id": 1502861017180
        },
        {
          "address": "127.0.0.1",
          "file": "CommitLog-6-1502860971369.log",
          "offset": 226525,
          "id": 1502861017180
        }
      ]
    }

Polling for Data

In accordance with an embodiment, the CDC process manager would continuously look for new data in the commit logs. The Java ScheduledExecutor service is used with a job frequency which is configurable.
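
A minimal sketch of such a polling loop using the Java ScheduledExecutorService is shown below; the interval and the body of the scan task are placeholders.

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Minimal sketch of the polling loop described above: a ScheduledExecutorService runs a
    // commit-log scan at a configurable frequency. The interval and task body are placeholders.
    public class CommitLogPoller {
        private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        public void start(long pollIntervalMillis) {
            scheduler.scheduleWithFixedDelay(this::scanCommitLogs, 0, pollIntervalMillis, TimeUnit.MILLISECONDS);
        }

        private void scanCommitLogs() {
            // Look for new or grown commit log segments in the CDC and active directories,
            // then hand any new data to the commit log reader.
        }

        public void stop() {
            scheduler.shutdown();
        }
    }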

Schema Loader

In accordance with an embodiment, a class SchemaLoader has the following responsibilities: read the table wildcard entries from the Cassandra VAM module and expand the wildcards; and load the schema instance as required by a Cassandra client application. The CommitLogReadHandler interface will be able to read commit log segment data only for tables and keyspaces loaded in the current schema instance of the client application, which is the CDC process manager.

Reading from Active Commit Logs

In accordance with an embodiment, the format of the commit log segments in the active commit logs is similar to the CDC commit logs. This makes it possible to read and position into the active commit logs. Reading active commit logs is a desirable feature as it reduces latency. It is also risky, as Cassandra database processes may truncate an active commit log and re-use it for future database changes. When an active commit log is truncated, the contents of the active commit log will be moved to the CDC directory into a new commit log. This could also result in data duplication, but offers lower latency.

Data Types

In accordance with an embodiment, Table 1 describes the data type mapping and data types supported for Oracle GoldenGate.

TABLE 1

Cassandra type   | GoldenGate type    | Length   | Support | Comments
UTF8Type         | GG_DT_CHAR         | Variable | Y       | UTF-8 encoded string.
AsciiType        | GG_DT_CHAR         | Variable | Y       | US-ASCII character string.
Int32Type        | GG_DT_INTEGER      | 4        | Y       | 32-bit signed integer.
ShortType        | GG_DT_INTEGER      | 2        | Y       | 2 byte integer.
IntegerType      | GG_DT_INTEGER      | 8        | Y       | Arbitrary-precision integer, mapped to Java BigInteger.
LongType         | GG_DT_INTEGER      | 8        | Y       |
ByteType         | GG_DT_INTEGER      | 1        | Y       | 1 byte integer.
DecimalType      | GG_DT_FLOAT        | 8        | Y       | Variable-precision decimal.
DoubleType       | GG_DT_FLOAT        | 8        | Y       | 64-bit IEEE-754 floating point.
FloatType        | GG_DT_FLOAT        | 4        | Y       | 32-bit IEEE-754 floating point.
SimpleDateType   | GG_DT_CHAR         |          | Y       | yyyy-mm-dd
TimestampType    | GG_DT_CHAR         |          | Y       | yyyy-MM-dd:HH:mm:ss.SSS +/- hh:mm
TimeType         | GG_DT_CHAR         |          | Y       | hh24:mi:ss.SSSSSS
BooleanType      | GG_DT_BINARY       | 1 bit    | Y       | 0x1 (for true); 0x0 (for false)
InetAddressType  | GG_DT_CHAR         | 45       | Y       | IP address string in IPv4 or IPv6 format, e.g. ABCD:ABCD:ABCD:ABCD:ABCD:ABCD:ABCD:ABCD or ABCD:ABCD:ABCD:ABCD:ABCD:ABCD:192.168.158.190.
UUIDType         | GG_DT_CHAR         | 36       | Y       | A UUID in standard 8-4-4-4-12 format (ex: 123e4567-e89b-12d3-a456-426655440000).
TimeUUIDType     | GG_DT_CHAR         | 36       | Y       | Timestamp-based UUID in standard 8-4-4-4-12 format (ex: 123e4567-e89b-12d3-a456-426655440000).
BLOB             | GG_DT_STORED_BLOB  | N/A      | Y       | Cassandra supports a maximum LOB size of 2 GB; the recommended size is 1 MB. The trail is written with a LOB chunk size of 8000 bytes per chunk.

Transaction ID (TranID GGS Partition Token)

In accordance with an embodiment, there are no transactions in Cassandra. Every operation record is enclosed in a pseudo transaction with one operation record. This is for compatibility with the VAM API. In accordance with an embodiment, the transaction ID is constructed by the concatenation of the node address, the commit log ID and the offset within the commit log.

Sequence Number (Sequence ID or CSN GGS Partition Token)

In accordance with an embodiment, this will be a simple sequence number starting from 1. Every record written to the trail will have a unique sequence number which increases in its value for every new record written to the trail.
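
A minimal sketch of constructing the pseudo transaction ID described above (node address, commit log ID, and offset joined together) and the increasing trail sequence number is shown below; the class and method names are illustrative.

    import java.util.concurrent.atomic.AtomicLong;

    // Minimal sketch of the pseudo transaction ID ([NodeAddress]:[CommitLogID]:[Offset]) and the
    // monotonically increasing trail sequence number. Class and method names are illustrative.
    public class RecordIdentifiers {
        private final AtomicLong sequenceId = new AtomicLong(0);

        /** e.g. "127.0.0.2:1502861017180:46621" */
        public String transactionId(String nodeAddress, long commitLogId, long offset) {
            return nodeAddress + ":" + commitLogId + ":" + offset;
        }

        /** Simple sequence number starting from 1, incremented for every record written to the trail. */
        public long nextSequenceId() {
            return sequenceId.incrementAndGet();
        }
    }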

VAM Position->Context

In accordance with an embodiment, the VAM data structure Position->Context will be populated with the same value as the transaction ID. This data is not used for positioning by the extract component. The extract component positioning relies on the checkpoint information and the full audit recovery (FAR) logic for exact positioning.

Positioning and Recovery

In accordance with an embodiment, OGG Extract can be positioned to capture as stated below:

Start reading all the available data from the beginning:

    GGSCI> ADD EXTRACT cassvam, TRANLOG

Start from current timestamp:

    GGSCI> ADD EXTRACT cassvam, TRANLOG, BEGIN NOW

Start from a given date and time:

    GGSCI> ADD EXTRACT cassvam, TRANLOG, BEGIN 2017-03-27 23:05:05.123456

Restart from previous run:

In this instance, the Java module CDC process manager writes a JSON position file under dirchk/casschk.json which has the information about positioning to all the nodes in the cluster.

Positioning and Recovery Scenarios

TABLE 2

Property               | Value
Transaction ID         | [NodeAddress]:[CommitLogID]:[Offset]
Context                | Same as Transaction ID
Context Display        | Same as Transaction ID
Operation timestamp    | Original timestamp of the operation.
Transaction timestamp  | This will be a sequenced timestamp. If the new record's timestamp is lower than the previous record's timestamp, the new record will be assigned the previous record's timestamp.
Sequence ID (Optional) | This can be a simple sequence number starting from 1. Every record written to the trail will have a unique sequence number.

In accordance with an embodiment, the capture process maintains checkpoint information (e.g., an extended checkpoint file, such as a JSON checkpoint file, as illustrated above) to store the commit log positions of all the nodes in the cluster.

In accordance with an embodiment, whenever the extract component/capture process issues a checkpoint complete event (GG_CONTROL_CHECKPOINT_COMPLETE), the Cassandra VAM module will update the checkpoint information to have the position of all the nodes in the cluster which are feeding data records.

In accordance with various embodiments, the teachings herein may be conveniently implemented using one or more conventional general purpose or specialized computers, computing devices, machines, or microprocessors, including one or more processors, memory and/or computer readable storage media programmed according to the teachings of the present disclosure. Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art.

In some embodiments, the teachings herein can include a computer program product which is a non-transitory computer readable storage medium (media) having instructions stored thereon/in which can be used to program a computer to perform any of the processes of the present teachings. Examples of such storage mediums can include, but are not limited to, hard disk drives, hard disks, hard drives, fixed disks, or other electromechanical data storage devices, floppy disks, optical discs, DVD, CD-ROMs, microdrive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems, or other types of storage media or devices suitable for non-transitory storage of instructions and/or data.

The foregoing description has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the scope of protection to the precise forms disclosed. Many modifications and variations will be apparent to the practitioner skilled in the art.

For example, although many of the features and techniques described herein are illustrated using the example of capturing data from a Cassandra database environment, in accordance with various embodiments, the features and techniques can be similarly used to capture data from other types of distributed data source systems, databases, data structures, or data streams, including, but not limited to, for example Apache Cassandra, Kafka, MongoDB, Oracle NoSQL, Google Bigtable, DB2, MySQL, or HDFS.

Thus, from one perspective, there has been described a system and method for capture of change data from a distributed data source system, for example a distributed database or a distributed data stream, and preparation of a canonical format output, for use with one or more heterogeneous targets, for example a database or message queue. The change data capture system can include support for features such as distributed source topology-awareness, initial load, deduplication, and recovery. A technical purpose of the systems and methods described herein includes determination and communication of changes performed to data at a distributed data source that includes a large amount of data across a plurality of nodes, to one or more target computer systems.

The embodiments were chosen and described in order to best explain the principles of the present teachings and their practical application, thereby enabling others skilled in the art to understand the various embodiments and with various modifications that are suited to the particular use contemplated. It is intended that the scope be defined by the following claims and their equivalents.

What is claimed is:
1. A system for capture of change data from a distributed data source, for use with heterogeneous targets, comprising: a computer that includes a processor, and a change data capture process manager executing thereon, wherein the change data capture process manager is configured to capture change data from a distributed data source, using a capture process, for use with one or more targets, including: determining a distributed source topology associated with a plurality of nodes in the distributed data source, wherein each node is associated with a source change trace entity that records data changes that are processed at that node; accessing the source change trace entities, to determine data changes at the distributed data source, for use with the one or more targets; and monitoring for a presence of new nodes or unavailability of one or more nodes within the distributed data source, whereupon a source node determined as being unavailable, selecting, from within a plurality of replica nodes at the distributed data source, a replica node from which to obtain change data records.

2. The system of claim 1, wherein the distributed data source is one of a distributed database, or a distributed data stream, or other distributed data source, and wherein the one or more targets include one or more of a database, message queue, or other target.
 3. The system of claim 1,wherein the change data capture process manager performs a change datacapture process that converts the change data read from the distributeddata source, into a canonical format output of the change data, forconsumption by the one or more targets.
 4. The system of claim 3,whereupon based on a target system to which the change data will becommunicated, the canonical format output of the change data isconverted to a format used by the target system.
 5. The system of claim3, wherein the change data capture process manager enables support for anew target system to be provided by a pluggable adapter component thatreads the canonical format output of the change data and converts it toa format used by the new target system.
 6. The system of claim 1,wherein the change data capture process manager performs a deduplicationprocess that provides automatic deduplication of the data provided bythe distributed data source.
 7. The system of claim 1, wherein thechange data capture process manager performs automatic discovery of thedistributed source topology associated with the distributed data sourcesystem, and provides access to one or more distributed source changetrace entity at nodes of the distributed data source system.
 8. Thesystem of claim 6, whereupon a change to the distributed source topologyassociated with the distributed data source system, including one ormore nodes being added to or removed from the distributed sourcetopology, the deduplication process detects the change to thedistributed source topology.
9. The system of claim 1, whereupon the change data capture process manager determining that a particular node in the distributed data source system, which had been providing records, becomes unavailable, the change data capture process manager performs a recovery process that selects a replica node at which to obtain records.

10. The system of claim 1, wherein if there is more than one replica node with a matching last record, a replica with the maximum record history is selected to feed a partition token found in the last record processed by the unavailable node.
 11. A method for capture of changedata from a distributed data source, for use with heterogeneous targets,comprising: capturing, by a change data capture process manager, changedata from a distributed data source, using a capture process, for usewith one or more targets; including determining a distributed sourcetopology associated with a plurality of nodes in the distributed datasource, wherein each node is associated with a source change traceentity that records data changes that are processed at that node;accessing the source change trace entities, to determine data changes atthe distributed data source, for use with the one or more targets; andmonitoring for a presence of new nodes or unavailability of one or morenodes within the distributed data source, whereupon a source nodedetermined as being unavailable, selecting, from within a plurality ofreplica nodes at the distributed data source, a replica node from whichto obtain change data records.
 12. The method of claim 11, wherein thedistributed data source is one of a distributed database, or adistributed data stream, or other distributed data source, and whereinthe one or more targets include one or more of a database, messagequeue, or other target.
 13. The method of claim 11, wherein the changedata capture process manager performs a change data capture process thatconverts the change data read from the distributed data source, into acanonical format output of the change data, for consumption by the oneor more targets.
 14. The method of claim 13, whereupon based on a targetsystem to which the change data will be communicated, the canonicalformat output of the change data is converted to a format used by thetarget system.
 15. The method of claim 13, wherein the change datacapture process manager enables support for a new target system to beprovided by a pluggable adapter component that reads the canonicalformat output of the change data and converts it to a format used by thenew target system.
 16. The method of claim 11, wherein the change datacapture process manager performs a deduplication process that providesautomatic deduplication of the data provided by the distributed datasource.
 17. The method of claim 11, wherein the change data captureprocess manager performs automatic discovery of the distributed sourcetopology associated with the distributed data source system, andprovides access to one or more distributed source change trace entity atnodes of the distributed data source system.
 18. The method of claim 16,whereupon a change to the distributed source topology associated withthe distributed data source system, including one or more nodes beingadded to or removed from the distributed source topology, thededuplication process detects the change to the distributed sourcetopology.
 19. The method of claim 11, whereupon the change data captureprocess manager determining that a particular node in the distributeddata source system, which had been providing records, becomesunavailable, the change data capture process manager performs a recoveryprocess that selects a replica node at which to obtain records.
 20. Themethod of claim 11, wherein if there is more than one replica node witha matching last record, a replica with the maximum record history isselected to feed a partition token found in the last record processed bythe unavailable node.
 21. A non-transitory computer readable storagemedium, including instructions stored thereon which when read andexecuted by one or more computers cause the one or more computers toperform a method comprising: capturing, by a change data capture processmanager, change data from a distributed data source, using a captureprocess, for use with one or more targets; including determining adistributed source topology associated with a plurality of nodes in thedistributed data source, wherein each node is associated with a sourcechange trace entity that records data changes that are processed at thatnode; accessing the source change trace entities, to determine datachanges at the distributed data source, for use with the one or moretargets; and monitoring for a presence of new nodes or unavailability ofone or more nodes within the distributed data source, whereupon a sourcenode determined as being unavailable, selecting, from within a pluralityof replica nodes at the distributed data source, a replica node fromwhich to obtain change data records.